owidR
In this note, we study the R package owidR for importing data from Our World in Data.
The package's official site links to further resources. When you cite the package, link to the official site.
In general, a README gives a short introduction to a package, a Manual gives comprehensive descriptions of each function, and a Vignette gives a practical introduction with examples and applications.
This package acts as an interface to Our World in Data datasets, allowing for an easy way to search through data used in over 3,000 charts and load them into the R environment.
Run the following the first time you use the package:
install.packages("owidR")
Then load the packages:
library(owidR)
library(tidyverse)
The package automatically loads part of the tidyverse, e.g., dplyr, ggplot2, …. Since it works well with the tidyverse, it is better to load tidyverse along with it.
The creator of this package also suggests loading the packages plm, for panel data, and texreg, for displaying models, but let us do without them until we actually need them. For panel data, see, for example, this site: https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/
In this package, a chart corresponds closely to a dataset, and a chart_id identifies that dataset.
owid_search
Search the data sources used in OWID charts
owid_search(term)
Since the output is long, I cut it off to the first six rows using
head().
owid_search("emissions") %>% head()
     titles
[1,] "Air pollutant emissions"
[2,] "Emissions of air pollutants"
[3,] "Emissions of air pollutants"
[4,] "Emissions of particulate matter"
[5,] "Global SO₂ emissions"
[6,] "Global sulphur dioxide (SO₂) emissions by world region"
chart_id
[1,] "air-pollutant-emissions"
[2,] "emissions-of-air-pollutants"
[3,] "emissions-of-air-pollutants-oecd"
[4,] "emissions-of-particulate-matter"
[5,] "global-so-emissions"
[6,] "so-emissions-by-world-region-in-million-tonnes"
A matrix is returned. If the list is long, it is easier to see the pairs of titles and chart_ids by adding as_tibble().
owid_search("emissions") %>% as_tibble()
If the list is not long, you do not need to add as_tibble(). Keep in mind, however, that each title is paired with a chart_id, and that you need the chart_id, not the title, to download the data with owid().
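Because the search result is a plain character matrix, the chart_id paired with a title can be looked up with ordinary matrix indexing. The following is a minimal sketch, not part of the package: it uses a small hand-made matrix as a stand-in for the owid_search() output (two rows copied from the listing above, with column names assumed to match the printed header), so it runs without a network connection.

```r
# Hand-made stand-in for the matrix printed by owid_search("emissions");
# the column names are assumed to match the printed header.
res <- cbind(
  titles   = c("Air pollutant emissions", "Emissions of particulate matter"),
  chart_id = c("air-pollutant-emissions", "emissions-of-particulate-matter")
)

# Pick the chart_id paired with a given title.
wanted <- "Emissions of particulate matter"
id <- unname(res[res[, "titles"] == wanted, "chart_id"])
id

# The same lookup by pattern, keeping title and chart_id side by side.
hits <- res[grepl("particulate", res[, "titles"]), , drop = FALSE]
hits
```

The value of id is what you would then pass to owid().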
owid_search("human rights")
     titles
[1,] "Human rights vs. electoral democracy"
[2,] "Countries with accredited independent national human rights institutions"
[3,] "Distribution of human rights"
[4,] "Human rights"
[5,] "Human rights vs. GDP per capita"
[6,] "Human rights weighted by population"
[7,] "Number of cases of killed human rights defenders, journalists and trade unionists"
[8,] "Share of countries with accredited independent national human rights institutions"
chart_id
[1,] "human-rights-vs-electoral-democracy"
[2,] "countries-with-independent-national-human-rights-institution"
[3,] "distribution-human-rights-vdem"
[4,] "human-rights-vdem"
[5,] "human-rights-vs-gdp-per-capita"
[6,] "human-rights-popw"
[7,] "cases-of-killed-human-rights-defenders-journalists-trade-unionists"
[8,] "share-countries-accredited-independent-national-human-rights-institutions"
owid
Get a dataset used in an OWID chart
owid(chart_id = NULL, rename = NULL, tidy.date = TRUE, ...)
chart_id: The chart_id as returned by owid_search, in which the words are joined with ‘-’. Do not confuse it with the chart title.
rename: Rename the value column. Currently only works if there is just one value column.
emissions <- owid("per-capita-ghg-emissions")
emissions
rights <- owid("human-rights-scores")
rights
Note.
Use the rename argument to change column names. For example,
owid("per-capita-ghg-emissions", rename = "ghgPcap")
Alternatively, use dplyr::rename. In the next example, I wrote `Total including LUCF` with backticks. However, 'Total including LUCF' and "Total including LUCF" work as well.
emissions %>% rename(ghgPcap = `Total including LUCF`)
top_n(1) is not the same as slice(1). slice(1) gives the first row only, whereas top_n(1) gives the row or rows with the largest value in the last column, as the message below shows.
owid("electoral-democracy") %>% top_n(1)
Selecting by electdem_vdem_low_owid
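How slice(1) and top_n(1) behave can be checked on a tiny example. This is only a sketch: the tibble below is made up for illustration (it is not OWID data), and slice_max() is shown as the current dplyr alternative to top_n().

```r
library(dplyr)

# Made-up data for illustration only (not an OWID dataset).
toy <- tibble(
  entity = c("A", "B", "C"),
  year   = c(2000, 2001, 2002),
  value  = c(0.2, 0.9, 0.5)
)

first_row <- toy %>% slice(1)                 # first row, whatever its values
largest   <- toy %>% top_n(1)                 # row(s) with the largest value in the last column
same_row  <- toy %>% slice_max(value, n = 1)  # the same selection, with the column named explicitly

first_row
largest
same_row
```

When no column is named, top_n() prints a "Selecting by ..." message naming the column it used, which is a useful check on what was actually selected.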
owid("electoral-democracy", rename = c("electoral_democracy", "vdem_high", "vdem_low"))
Alternatively, use dplyr::rename, and keep a record of the renamed column names.
(democracy <- owid("electoral-democracy"))
democracy <- democracy %>%
  rename(`electoral_democracy` = `Electoral democracy`,
         `vdem_high` = `electdem_vdem_high_owid`,
         `vdem_low` = `electdem_vdem_low_owid`)
democracy
owid_source
A function to get source information from an OWID dataset and display it in the R console.
owid_source(data)
owid_source(emissions)
Dataset Name: Our World in Data based on Climate Analysis Indicators Tool (CAIT).
Published By: CAIT Climate Data Explorer via Climate Watch
Link: https://www.climatewatchdata.org/data-explorer/historical-emissions
Emissions are measured in tonnes of carbon dioxide equivalents (CO₂e), based on 100-year global warming potential factors for non-CO₂ gases.
Emissions are broken down by sector. Further information on sector definitions is available <a href="https://ourworldindata.org/ghg-emissions-by-sector">here</a>.
owid_source(rights)
Dataset Name: Fariss et al. (2020)
Published By: Fariss CJ, Kenwick MR, Reuning K. Estimating one-sided-killings from a robust measurement model of human rights. Journal of Peace Research. 2020;57(6):801-814. doi:10.1177/0022343320965670
Link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/RQ85GK
This dataset provides the human rights protection scores by Fariss et al. (2020), first developed by Schnakenberg and Fariss (2014).
You can download the code and complete dataset, including supplementary variables, from GitHub: https://github.com/owid/notebooks/tree/main/BastianHerre/human_rights
view_chart
A function that opens the original OWID chart in your browser
view_chart(x)
x: Either a tibble returned by owid(), or a chart_id.
Example
The first call uses the chart, i.e., the tibble returned by owid(), and the others use a chart_id. You can also embed the chart in your R Markdown file by copying the Embed iframe link from the Share button at the bottom right corner of the chart.
firearm_suicide <- owid("suicide-rate-by-firearm")
view_chart(firearm_suicide)
view_chart("electoral-democracy")
view_chart("share-of-individuals-using-the-internet")
pal_owid
Colour palettes based on the colours used by Our World in Data
owid_covid
Get the Our World in Data covid-19 dataset
owid_covid()
See the details at the GitHub site.
covid <- owid_covid()
covid %>% filter(location == "Japan")
The following is based on the presentation and the first two R Notebook files created by Professor Kaizoji.
Let’s use the core functions to get data on how human rights have changed over time, starting with a search for charts on human rights.
owid_search("human rights") %>% as_tibble()
Let’s use the human rights protection dataset.
rights <- owid("human-rights-protection")
rights
ggplot2 makes it easy to visualise our data.
rights %>%
  filter(entity %in% c("United Kingdom", "France", "United States", "Japan")) %>%
  ggplot(aes(year, `Human rights protection`, colour = entity)) +
  geom_line()
owid_search("internet") %>% as_tibble()
Get a dataset used in an OWID chart.
internet <- owid("share-of-individuals-using-the-internet", rename = "internet_use")
internet
Get source information on an OWID dataset.
owid_source(internet)
Dataset Name: International Telecommunication Union (via World Bank)
Published By: World Development Indicators - World Bank (2022.05.26)
Link: https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators
A function that opens the original OWID chart in your browser.
view_chart(internet)
Plot an owid dataset
owid_plot(internet, filter = "World") +
  labs(title = "Share of the World Population using the Internet") +
  scale_y_continuous(limits = c(0, 100)) +
  theme_owid()
Loading required namespace: showtext
Failed with error: ‘there is no package called ‘showtext’’
Warning: importing fonts requires the showtext pacakge
owid_plot(internet, summarise = FALSE, filter = c("United Kingdom", "Spain", "Russia", "Egypt", "Nigeria")) +
  labs(title = "Share of Population Using the Internet") +
  # The labels argument makes it clear that the value is a percentage
  scale_y_continuous(limits = c(0, 100), labels = scales::label_number(suffix = "%"))
Warning: 'scale_colour_owid' is deprecated.
See help("Deprecated")
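To see what the labels argument does on its own, the labelling function can be called directly. This is a small sketch, assuming the scales package is installed (it comes with ggplot2): label_number() returns a function, and that function turns numbers into the strings shown on the axis.

```r
library(scales)

# label_number() returns a function that formats a numeric vector as text.
percent_labels <- label_number(suffix = "%")

# Apply it to some axis break values to preview the tick labels.
percent_labels(c(0, 50, 100))
```

Because the suffix is only appended to the value as it is, this suits data already expressed on a 0 to 100 scale, as the internet-use share is here.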
Creating a choropleth map
owid_map(internet, year = 2017) +
  labs(title = "Share of Population Using the Internet, 2017")
owid_search("democrac") %>% as_tibble()
democracy <- owid("electoral-democracy", rename = c("electoral_democracy", "vdem_high", "vdem_low"))
democracy
owid_source(democracy)
Value:
Dataset Name: OWID based on V-Dem (v12) and Lührmann et al. (2018)
Published By: Our World in Data, Bastian Herre
Link: http://v-dem.net/vdemds.html
This dataset provides information on democracy and human rights, using data from the Varieties of Democracy project (v12), and the Regimes of the World classification by Lührmann et al. (2018).
We expand the countries and years covered, and refine the coding of the Regimes of the World classification. You can read a detailed description of the data in these posts:
https://ourworldindata.org/regimes-of-the-world-data
https://ourworldindata.org/vdem-electoral-democracy-data
https://ourworldindata.org/vdem-human-rights-data
You can download the code and complete dataset, including supplementary variables, from GitHub: https://github.com/owid/notebooks/tree/main/BastianHerre/democracy
owid_map(democracy, year = 2015, palette = "YlGn") +
  labs(title = "Electoral Democracy")
owid_plot(democracy, summarise = FALSE, filter = c("United Kingdom", "Spain", "Russia", "Egypt", "Nigeria")) +
  labs(title = "Electoral Democracy") +
  # The index ranges from 0 to 1, so no percentage suffix is added
  scale_y_continuous(limits = c(0, 1))
gdp <- owid("gdp-per-capita-worldbank", rename = "gdp")
gov_exp <- owid("total-gov-expenditure-gdp-wdi", rename = "gov_exp")
age_dep <- owid("age-dependency-ratio-of-working-age-population", rename = "age_dep")
unemployment <- owid("unemployment-rate", rename = "unemp")
Mutating joins
left_join(): includes all rows in x.
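A small example may make the behaviour of left_join() concrete. This is a sketch on made-up tables (the values are invented for illustration and are not OWID data); the key columns entity, code and year mimic those of the OWID tibbles used in this note.

```r
library(dplyr)

# Made-up tables for illustration only; the key columns mimic those
# of OWID datasets (entity, code, year).
x <- tibble(
  entity       = c("Japan", "France"),
  code         = c("JPN", "FRA"),
  year         = c(2015, 2015),
  internet_use = c(91, 78)
)
y <- tibble(entity = "Japan", code = "JPN", year = 2015, gdp = 40000)

# Every row of x is kept; rows without a match in y get NA in the new column.
# Naming the keys explicitly avoids the "Joining, by = ..." message.
joined <- left_join(x, y, by = c("entity", "code", "year"))
joined
```

France has no row in y, so it stays in the result with gdp set to NA. This is why a chain of left joins never shrinks the first table.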
data <- internet %>%
  left_join(democracy) %>%
  left_join(gdp) %>%
  left_join(gov_exp) %>%
  left_join(age_dep) %>%
  left_join(unemployment)
Joining, by = c("entity", "code", "year")
Joining, by = c("entity", "code", "year")
Joining, by = c("entity", "code", "year")
Joining, by = c("entity", "code", "year")
Joining, by = c("entity", "code", "year")
Drawing scatter plot
data %>%
  filter(year == 2015) %>%
  ggplot(aes(internet_use, electoral_democracy)) +
  geom_point(colour = "#57677D", na.rm = TRUE) +
  geom_smooth(method = "lm", colour = "#DC5E78", na.rm = TRUE) +
  labs(title = "Relationship Between Internet Use and electoral_democracy", x = "Internet Use", y = "electoral_democracy") +
  theme_owid()
data %>%
  filter(year == 2015) %>%
  ggplot(aes(gdp, internet_use)) +
  geom_point(colour = "blue") +
  geom_smooth(method = "gam", colour = "red", level = 0.0) +
  labs(title = "Relationship Between Internet Use and GDP", x = "GDP", y = "Internet Use")
model1 <- lm(electoral_democracy ~ internet_use, data)
summary(model1)
Call:
lm(formula = electoral_democracy ~ internet_use, data = data)
Residuals:
     Min       1Q   Median       3Q      Max
-0.76097 -0.20192  0.02248  0.18764  0.48558
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4283461  0.0042829  100.01   <2e-16 ***
internet_use 0.0035522  0.0001166   30.46   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2475 on 5311 degrees of freedom
(1347 observations deleted due to missingness)
Multiple R-squared: 0.1487, Adjusted R-squared: 0.1485
F-statistic: 927.7 on 1 and 5311 DF, p-value: < 2.2e-16
model2 <- lm(gdp ~ internet_use, data)
summary(model2)
Call:
lm(formula = gdp ~ internet_use, data = data)
Residuals:
   Min     1Q Median     3Q    Max
-35872  -7670  -4496   3306 126683
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  8101.982    276.503   29.30   <2e-16 ***
internet_use  413.174      7.315   56.48   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16350 on 5872 degrees of freedom
(786 observations deleted due to missingness)
Multiple R-squared: 0.352, Adjusted R-squared: 0.3519
F-statistic: 3190 on 1 and 5872 DF, p-value: < 2.2e-16
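The summaries above report that some observations were "deleted due to missingness": by default lm() drops every row with a missing value in a variable used by the model. The sketch below shows this on made-up data (invented for illustration, not OWID data), together with two base R helpers for pulling numbers out of a fitted model.

```r
# Made-up data for illustration only: y = 2 + 3 * x, with one missing x.
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(5, 8, 11, 14, 17)
)

fit <- lm(y ~ x, data = toy)

coef(fit)  # intercept and slope as a named numeric vector
nobs(fit)  # number of rows actually used; the row with the missing x is dropped
```

Comparing nobs() across models is a quick way to notice that two regressions were fitted on different subsets of the data, as is the case for Model 1 and Model 2 here.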
We create a table of the regression results using texreg. The first time, install the package texreg.
install.packages("texreg")
library(texreg)
models <- list("Model 1" = model1,
               "Model 2" = model2)
screenreg(models, stars = NULL)
================================
              Model 1   Model 2
--------------------------------
(Intercept)      0.43   8101.98
                (0.00)  (276.50)
internet_use     0.00    413.17
                (0.00)    (7.32)
--------------------------------
R^2              0.15      0.35
Adj. R^2         0.15      0.35
Num. obs.        5313      5874
================================